Reliably Capture Local Clusters in Noisy Domains From Parallel Universes

Authors

  • Frank Höppner
  • Mirko Böttcher
Abstract

When searching for small local patterns, it is very difficult to distinguish between incidental agglomerations of noisy points and true local patterns. We propose a new algorithm [2] that addresses this problem by exploiting the temporal information contained in most business data sets. The algorithm detects local patterns in noisy data sets more reliably than is possible when the temporal information is ignored. This is achieved by making use of the fact that noise does not reproduce its incidental structure, whereas even small patterns do. In particular, we developed a method to track clusters over time based on an optimal match of data partitions between time periods. Using the terminology of parallel universes in [1], our approach is characterised as follows:
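As an aside on the matching step mentioned in the abstract, here is a minimal sketch of how clusters from two time periods might be paired. It is an illustration only and not the algorithm of [2]: the function names `centroid` and `match_partitions`, the use of centroid distance as the matching cost, the assumption that both periods have the same number of clusters, and the toy data are all hypothetical choices made for this example.

```python
from itertools import permutations
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def match_partitions(clusters_t0, clusters_t1):
    """Pair each cluster of period t0 with one cluster of period t1 so that
    the summed centroid distance is minimal.

    Brute force over all permutations, so only suitable for a handful of
    clusters. Assumes (hypothetically) that both periods contain the same
    number of clusters.
    """
    c0 = [centroid(c) for c in clusters_t0]
    c1 = [centroid(c) for c in clusters_t1]
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(c1))):
        cost = sum(dist(c0[i], c1[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Toy data: two clusters per period, listed in a different order.
period_a = [[(0, 0), (0, 2)], [(10, 10), (12, 10)]]
period_b = [[(10, 11), (12, 11)], [(1, 1), (1, 3)]]

pairs, total_cost = match_partitions(period_a, period_b)
print(pairs)  # [(0, 1), (1, 0)]
```

In this toy example the first cluster of `period_a` is paired with the second cluster of `period_b` and vice versa. One could then treat pairs whose centroids lie far apart as candidates for incidental agglomerations that did not reappear; any such threshold is an assumption of this sketch and is not taken from the abstract. For more than a few clusters, a polynomial-time assignment solver would be the natural replacement for the permutation loop.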


Similar resources

Matching Partitions over Time to Reliably Capture Local Clusters in Noisy Domains

When searching for small clusters, it is very difficult to distinguish between incidental agglomerations of noisy points and true local patterns. We present the PAMALOC algorithm, which addresses this problem by exploiting the temporal information contained in most business data sets. The algorithm enables more reliable detection of local patterns in noisy data sets compared to the case when...


Fuzzy Clustering in Parallel Universes with Noise Detection

We present an extension of the fuzzy c-Means algorithm that operates on different feature spaces, so-called parallel universes, simultaneously and also incorporates noise detection. The method assigns membership values of patterns to different universes, which are then adopted throughout the training. This leads to better clustering results since patterns not contributing to clustering in a uni...


Lernen in parallelen Universen

Classical data mining techniques are almost always based on a unique object representation, which is often realized as a high-dimensional attribute vector per object. In many application domains, however, the objects to be analyzed (molecules, 3D models, processes) can easily be described in various ways, which leads to a variety of object representations: so-called Parallel Universes. This the...


Subspace outlier mining in large multimedia databases

Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. Clustering aims at grouping similar objects into clusters while separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped...


Infall Regions of Galaxy Clusters

In hierarchical clustering, galaxy clusters accrete mass through the aggregation of smaller systems. Thus, the velocity field of the infall regions of clusters contains significant random motion superimposed on radial infall. Because the purely spherical infall model does not predict the amplitude of the velocity field correctly, methods estimating the cosmological density parameter Ω0 based on...




Publication date: 2007